
chore: introduce benchmarking for rust #4552

Merged 1 commit into tobymao:main on Jan 7, 2025

Conversation

benfdking (Collaborator) commented:
  • Introduces a Criterion-based benchmark built on the long SQL test input
  • Introduces serde serializing and deserializing support that is not built into the main release
  • Introduces some JSON files that are simple serializations of a set of configs taken from sqlglot
  • Introduces a GitHub Actions workflow that compares performance in a PR against main to check for regressions
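The shape of the measurement the benchmark performs can be sketched with only the standard library. This is not the Criterion harness from the PR itself: `tokenize` below is a hypothetical stand-in for the Rust tokenizer, and a real bench would live under `benches/` and use the `criterion` crate's `bench_function`/`b.iter` machinery for statistical sampling.

```rust
use std::time::Instant;

// Hypothetical stand-in for the tokenizer under test; the actual PR
// benchmarks sqlglot's Rust tokenizer via criterion.
fn tokenize(sql: &str) -> Vec<&str> {
    sql.split_whitespace().collect()
}

fn main() {
    // A "long SQL" input, analogous to the fixture the bench would load.
    let long_sql = "SELECT a, b FROM t WHERE a > 1 ".repeat(1_000);

    // Time many iterations; criterion does this with warm-up and
    // statistical analysis, which this sketch omits.
    let iters: u32 = 100;
    let start = Instant::now();
    let mut total_tokens = 0usize;
    for _ in 0..iters {
        total_tokens += tokenize(&long_sql).len();
    }
    let per_iter = start.elapsed() / iters;
    println!("{} tokens/iter, {:?}/iter", total_tokens / iters as usize, per_iter);
}
```

A regression check in CI then boils down to comparing the per-iteration time on the PR branch against the same measurement on main.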

@benfdking force-pushed the introducing_benchmarking branch from 1874140 to c3388f2 on December 30, 2024, 14:15
@georgesittas (Collaborator) commented Jan 7, 2025:

FYI, the numbers (values) in the settings maps can change depending on the order of the relevant enums (e.g., token types) in Python. I'm wondering whether it's a good idea to check the JSON files into the repo, given that we'd have to re-generate them every time we benchmark to make sure the numbers aren't stale. Or am I missing something?

@benfdking force-pushed the introducing_benchmarking branch from c3388f2 to 8e8239e on January 7, 2025, 17:21
@georgesittas left a comment:

Re: #4552 (comment) (sharing internal discussion).

It’s probably fine for now. I think if the TokenType enum is updated, we'd simply miss the newer tokens and use an outdated mapping, but that’s fine because we don’t use those tokens anyway and we're only using the rust tokenizer directly. So, it’s like taking a snapshot of the tokens supported today and benchmarking the tokenizer using them.
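The failure mode described above can be sketched with a std-only example, using hypothetical token names and ordinals (not sqlglot's real values): the checked-in JSON snapshots the enum ordinals Python assigned at generation time, so a token added to `TokenType` afterward is simply absent from the map and falls through to a fallback, rather than being mis-numbered.

```rust
use std::collections::HashMap;

// Hypothetical snapshot of Python enum ordinals at generation time,
// standing in for one of the checked-in JSON config maps.
fn snapshot() -> HashMap<&'static str, u32> {
    HashMap::from([("SELECT", 0), ("FROM", 1), ("WHERE", 2)])
}

// Unknown (newer) tokens fall back to a catch-all id instead of
// failing, mirroring the "we'd simply miss the newer tokens" behavior.
fn token_id(map: &HashMap<&str, u32>, name: &str) -> u32 {
    const UNKNOWN: u32 = u32::MAX;
    map.get(name).copied().unwrap_or(UNKNOWN)
}

fn main() {
    let map = snapshot();
    // Known at snapshot time:
    println!("SELECT -> {}", token_id(&map, "SELECT"));
    // Hypothetically added to TokenType after the snapshot:
    println!("QUALIFY -> {}", token_id(&map, "QUALIFY"));
}
```

Because the benchmark only exercises tokens that existed when the snapshot was taken, a stale map degrades coverage rather than correctness.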

@georgesittas georgesittas merged commit 9921528 into tobymao:main Jan 7, 2025
7 of 8 checks passed